# Load packages
library(tidyverse)
library(gganimate)
library(gifski)

For our final project, we chose to focus on CO2 emissions per capita and life expectancy. We use data sets from Gapminder, which compiles them from a variety of sources. The co2_emissions data set contains 194 observations (countries) and 224 variables (the country name plus one column per year from 1800 to 2022). These data record consumption-based CO2 emissions, in tonnes of CO2 per capita. The life_expectancy data set contains 195 observations (countries) and 302 variables (the country name plus one column per year from 1800 to 2100). These data represent the number of years a newborn would live if the mortality rates at the time of their birth remained constant throughout their life.
# Data
co2_emissions <- read_csv(here::here("co2_pcap_cons.csv"))
life_expectancy <- read_csv(here::here("lex.csv"))
# Pivoting Longer
co2_emissions_long <- co2_emissions |>
  # Replace Unicode minus signs (−) with ASCII hyphens so negative values parse
  mutate(across(`1800`:`2022`, ~ str_replace_all(.x, "−", "-"))) |>
  mutate(across(`1800`:`2022`, ~ as.numeric(.x))) |>
  pivot_longer(
    cols = `1800`:`2022`,
    names_to = "year",
    values_to = "co2"
  )

life_exp_long <- life_expectancy |>
  mutate(across(`1800`:`2100`, ~ as.numeric(.x))) |>
  pivot_longer(
    cols = `1800`:`2100`,
    names_to = "year",
    values_to = "life_exp"
  )
# Join Datasets
project_data <- co2_emissions_long |>
  inner_join(life_exp_long,
    by = c("country", "year")
  ) |>
  # Year values come from column names, so they are character until converted
  mutate(year = as.numeric(year))

After downloading our two data sets, CO2 emissions per capita and life expectancy, we received warnings when we tried to pivot them. We learned that R was reading some values as character rather than numeric because negative numbers, specifically in the CO2 emissions data, were written with a Unicode minus sign (−) instead of a standard hyphen (-).
To fix this, we used str_replace_all() to replace the Unicode minus signs with standard hyphens so that the values could be converted to negative numbers. This step was only needed for the CO2 emissions data.
We then applied as.numeric() to the year columns of both data sets and pivoted each one to long format, which gives us the variables country, year, and the measured value for that data set (co2 or life_exp).
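The cleaning steps described above can be illustrated on a tiny made-up table. This is only a sketch: the tibble `toy_wide`, its country names, and its values are hypothetical and are not taken from the Gapminder files.

```r
library(dplyr)
library(tidyr)
library(stringr)

# The Unicode minus sign (U+2212), which as.numeric() cannot parse
unicode_minus <- "\u2212"

# Hypothetical wide-format data: one row per country, one column per year
toy_wide <- tibble(
  country = c("A", "B"),
  `1800`  = c("0.5", paste0(unicode_minus, "0.2")),
  `1801`  = c("0.6", "0.1")
)

toy_long <- toy_wide |>
  # Swap the Unicode minus for an ASCII hyphen, then convert to numeric
  mutate(across(`1800`:`1801`,
                ~ as.numeric(str_replace_all(.x, unicode_minus, "-")))) |>
  pivot_longer(
    cols = `1800`:`1801`,
    names_to = "year",
    values_to = "co2"
  ) |>
  # Former column names are character strings, so convert year as well
  mutate(year = as.numeric(year))

toy_long
```

Without the replacement step, as.numeric() would return NA (with a warning) for the value written with the Unicode minus sign, which matches the kind of warning we ran into.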
In our model, CO2 emissions are the explanatory variable and life expectancy is the response variable. We expect a negative relationship between the two: countries that emit more CO2 per capita should tend to have lower life expectancy. Our reasoning is that CO2 emissions reduce air quality and cause respiratory problems, which in turn lower life expectancy.
# Visualization 1:
animated <- project_data |>
  ggplot(aes(x = co2,
             y = life_exp)) +
  geom_point(alpha = 0.7, show.legend = FALSE,
             na.rm = TRUE) +
  labs(title = "CO2 Emissions vs. Life Expectancy per Year",
       subtitle = "Year: {round(frame_time)}",
       x = "CO2 Emissions (tonnes per capita)",
       y = "Life Expectancy (years)") +
  transition_time(year) +
  ease_aes("linear") +
  theme_bw()

animate(animated, renderer = gifski_renderer(), fps = 7)

# Visualization 2:
project_data |>
  group_by(country) |>
  summarize(avg_co2 = mean(co2),
            avg_life_exp = mean(life_exp)
  ) |>
  ggplot(mapping = aes(x = avg_co2,
                       y = avg_life_exp)
  ) +
  geom_point(alpha = 0.7,
             na.rm = TRUE) +
  labs(title = "Average CO2 Emissions and Life Expectancy per Country",
       x = "Average CO2 Emissions (tonnes per capita)",
       y = "",
       subtitle = "Average Life Expectancy (years)") +
  theme_bw()

# Visualization:
project_data |>
  group_by(country) |>
  summarize(avg_co2 = mean(co2),
            avg_life_exp = mean(life_exp)) |>
  ggplot(aes(x = avg_co2,
             y = avg_life_exp)) +
  geom_jitter(na.rm = TRUE) +
  geom_smooth(method = "lm", na.rm = TRUE) +
  theme_bw() +
  labs(title = "Average CO2 Emissions and Life Expectancy per Country",
       x = "Average CO2 Emissions (tonnes per capita)",
       y = NULL,
       subtitle = "Average Life Expectancy (years)")

# Model:
# Per-country averages, stored separately so the model does not overwrite them
country_avgs <- project_data |>
  group_by(country) |>
  summarize(avg_co2 = mean(co2),
            avg_life_exp = mean(life_exp))

# Coefficients:
project_lm <- lm(avg_life_exp ~ avg_co2,
                 data = country_avgs)

broom::tidy(project_lm) |>
  knitr::kable(digits = 3)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 41.149 | 0.404 | 101.908 | 0 |
| avg_co2 | 1.806 | 0.155 | 11.641 | 0 |
project_lm |>
  broom::augment() |>
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_abline(slope = 0,
              intercept = 0,
              color = "steelblue",
              linetype = "dashed",
              lwd = 1.5)

\(\widehat{y} = 41.149 + 1.806x\)
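Here \(x\) is a country's average CO2 emissions (tonnes per capita) and \(\widehat{y}\) is its predicted average life expectancy (years). For every additional tonne of CO2 emissions per capita, we expect average life expectancy to increase by about 1.8 years. This positive association is the opposite of the negative relationship we anticipated.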
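To make the slope interpretation concrete, the fitted line can be evaluated at a few emissions values. This is a minimal sketch using the rounded coefficients from the table above; the helper `predict_life_exp()` and the input values are hypothetical and chosen only for illustration.

```r
# Rounded coefficients copied from the coefficient table
intercept <- 41.149
slope     <- 1.806

# Hypothetical helper: predicted average life expectancy (years)
# for a given average CO2 emissions value (tonnes per capita)
predict_life_exp <- function(avg_co2) {
  intercept + slope * avg_co2
}

predict_life_exp(c(0, 1, 5))
# [1] 41.149 42.955 50.179
```

Each one-tonne step in average emissions moves the prediction by the slope, 1.806 years. These values describe the fitted association across countries and should not be read as a causal effect.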